fix(wgpu): avoid llvmpipe fallback on hybrid GPU Wayland setups with stale surfaces #263

Open
glima wants to merge 3 commits into pop-os:master from glima:fix-wgpu-adapter-stale-surface

glima commented Feb 21, 2026

Problem

On Wayland with hybrid GPU laptops (e.g. NVIDIA + Intel), COSMIC panel applets consistently fall back to llvmpipe (CPU software rasterizer) instead of using the real GPU.

The root cause is a race condition during compositor initialization: the Wayland surface is not yet fully committed when wgpu probes adapter compatibility, causing VK_ERROR_SURFACE_LOST_KHR in vkGetPhysicalDeviceSurfaceCapabilitiesKHR. This falsely rejects real GPU adapters while llvmpipe survives (its Vulkan implementation doesn't perform the same surface validation).

This is particularly common with layer-shell panel applets, where the surface lifecycle differs from regular windows.

Observed behaviour

ERROR wgpu_hal::vulkan::adapter: get_physical_device_surface_capabilities: ERROR_SURFACE_LOST_KHR

Selected: AdapterInfo {
    name: "llvmpipe (LLVM 21.1.8, 256 bits)",
    device_type: Cpu,
    ...
}

Despite an NVIDIA T550 (discrete) and Intel Iris Xe (integrated) being available.

Fix

This PR contains two independent, incremental commits:

Commit 1: Trust DMA-BUF device preference without probing surface

When the Wayland compositor provides DMA-BUF feedback with vendor/device IDs that match an enumerated adapter, select it directly without the redundant is_surface_supported() probe. The DMA-BUF feedback already guarantees which device the compositor wants, and the surface will be (re-)created on the chosen adapter later anyway.
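
As a rough illustration of the matching step described for Commit 1, here is a minimal, self-contained sketch. It does not use the real wgpu types: `AdapterInfo` below is a hypothetical stand-in that only loosely mirrors the `vendor`/`device` fields, `pick_by_dmabuf_ids` is an invented helper name, and the device IDs are placeholder values. The `Option<(u32, u32)>` argument is an assumption about what the DMA-BUF lookup yields.

```rust
// Hypothetical stand-in for an enumerated adapter; not the real wgpu type.
#[derive(Debug, Clone, PartialEq)]
struct AdapterInfo {
    name: &'static str,
    vendor: u32,
    device: u32,
}

/// Return the adapter whose vendor/device IDs match the compositor's
/// DMA-BUF feedback. No surface-support probe is performed here, which is
/// the point of the change: a stale surface cannot veto the match.
fn pick_by_dmabuf_ids(
    adapters: &[AdapterInfo],
    ids: Option<(u32, u32)>,
) -> Option<&AdapterInfo> {
    // No DMA-BUF feedback available: let the caller use another strategy.
    let (vendor, device) = ids?;
    adapters
        .iter()
        .find(|a| a.vendor == vendor && a.device == device)
}

fn main() {
    // Placeholder device IDs, for illustration only.
    let adapters = [
        AdapterInfo { name: "NVIDIA T550 Laptop GPU", vendor: 0x10DE, device: 0x0001 },
        AdapterInfo { name: "Intel Iris Xe", vendor: 0x8086, device: 0x0002 },
        AdapterInfo { name: "llvmpipe", vendor: 0x10005, device: 0x0000 },
    ];

    let picked = pick_by_dmabuf_ids(&adapters, Some((0x8086, 0x0002)));
    println!("picked: {:?}", picked.map(|a| a.name));
}
```

When the lookup returns `None` (no feedback, or no matching adapter), the caller would continue with the regular selection path.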

Commit 2: Retry adapter selection without surface when only CPU adapter found

When request_adapter() returns only a CPU software rasterizer, assume that real GPUs were falsely rejected by the stale surface and retry without the compatible_surface constraint. This lets wgpu's power-preference ranking pick the best real GPU available.

If the surface-less retry also yields nothing, the code falls back to whatever the original surface-constrained attempt returned (possibly the CPU adapter), so behaviour is never worse than before.
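
The retry-and-fall-back order described for Commit 2 can be sketched roughly as follows. This is a simplified model, not the code in this PR: `Adapter`, `DeviceType`, and `select_adapter` are hypothetical names, and the closure argument stands in for a call to `request_adapter()` with (`true`) or without (`false`) the `compatible_surface` constraint. The sketch assumes the retry is triggered only when the first attempt returns a CPU adapter.

```rust
// Hypothetical, simplified stand-ins; not the real wgpu types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeviceType {
    DiscreteGpu,
    Cpu,
}

#[derive(Debug, Clone, PartialEq)]
struct Adapter {
    name: &'static str,
    device_type: DeviceType,
}

/// `request(with_surface)` models one adapter request.
/// 1. Try with the surface constraint.
/// 2. If that yields a CPU adapter, retry without the constraint.
/// 3. If the retry yields nothing, keep the original result.
fn select_adapter(mut request: impl FnMut(bool) -> Option<Adapter>) -> Option<Adapter> {
    let first = request(true);
    let is_cpu = matches!(&first, Some(a) if a.device_type == DeviceType::Cpu);
    if !is_cpu {
        // Real GPU (or nothing at all): existing behaviour is unchanged.
        return first;
    }
    // Assume the surface was stale and real GPUs were falsely rejected.
    request(false).or(first)
}

fn main() {
    let cpu = Adapter { name: "llvmpipe", device_type: DeviceType::Cpu };
    let gpu = Adapter { name: "NVIDIA T550 Laptop GPU", device_type: DeviceType::DiscreteGpu };

    // Simulated stale surface: only llvmpipe passes the surface check.
    let picked = select_adapter(|with_surface| {
        if with_surface { Some(cpu.clone()) } else { Some(gpu.clone()) }
    });
    println!("picked: {:?}", picked.map(|a| a.name));
}
```

On a CPU-only system, both requests return the same CPU adapter, so the result matches the pre-change behaviour.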

Edge cases considered

  • Systems with only a CPU adapter (e.g. VMs without GPU passthrough): the retry produces the same CPU result and it is used as expected.
  • Systems where DMA-BUF feedback is unavailable (e.g. layer-shell applets where get_wayland_device_ids returns None): Commit 1 is skipped and Commit 2 handles the fallback.
  • Systems where the surface is valid: request_adapter returns a real GPU on the first try, the CPU check is false, and the existing code path is taken with no behavioural change.

Testing

Tested on a hybrid GPU laptop (NVIDIA T550 + Intel Iris Xe, driver 580.119.02, Mesa 25.3.5) running COSMIC desktop on Wayland. Before this fix, the applet always selected llvmpipe. After:

WARN  iced_wgpu::window::compositor: adapter selection: surface-compatible pick is
      llvmpipe (LLVM 21.1.8, 256 bits); retrying without surface constraint

Selected: AdapterInfo {
    name: "NVIDIA T550 Laptop GPU",
    device_type: DiscreteGpu,
    backend: Vulkan,
    ...
}

glima force-pushed the fix-wgpu-adapter-stale-surface branch from 065f615 to 32b61d6 March 8, 2026 07:36
glima force-pushed the fix-wgpu-adapter-stale-surface branch 2 times, most recently from e348103 to 57d6946 April 10, 2026 23:42
glima and others added 3 commits April 22, 2026 21:37
On Wayland with hybrid GPU setups (e.g. NVIDIA + Intel), the surface
may not be fully committed when wgpu probes adapter compatibility
during compositor initialization.  This causes
VK_ERROR_SURFACE_LOST_KHR in vkGetPhysicalDeviceSurfaceCapabilitiesKHR,
which falsely rejects the correct GPU and forces a fallback to llvmpipe.

The DMA-BUF feedback from the Wayland compositor already guarantees
which device should be used for rendering.  When vendor/device IDs from
DMA-BUF match an enumerated adapter, select it directly without the
redundant (and race-prone) surface compatibility probe.  The surface
will be (re-)created on the chosen adapter later anyway.
…ter found

When request_adapter() returns only a CPU software rasterizer (e.g.
llvmpipe), it is likely that real GPU adapters were falsely rejected by
a stale Wayland surface.

This commonly happens with layer-shell panel applets, where the surface
is not yet fully committed when the compositor initialises the wgpu
backend.  The Vulkan surface capability query fails with
VK_ERROR_SURFACE_LOST_KHR for real GPUs, but software rasterizers
survive because they do not perform the same surface validation.

Retry adapter selection without the compatible_surface constraint so
that wgpu's power-preference ranking can pick a real GPU (discrete or
integrated).  If the surface-less retry also yields nothing, fall back
to whatever the original surface-constrained attempt returned (possibly
the CPU adapter), so behaviour is never worse than before.

On a system with only a CPU adapter (e.g. a VM with no GPU passthrough)
the retry produces the same result and the CPU adapter is used as
expected.
…t widget

Three bugs prevented the List widget from working correctly:

1. Lazy iterator silently dropped all events (.map → .for_each)

   The event dispatch loop used `.map()`, which produces a lazy
   iterator that was never consumed.  No click, keyboard, or any
   other event was ever forwarded to child widgets.

2. Visible elements empty during input events (hoist repopulation)

   `visible_elements` lives on the `List` struct, which is recreated
   every `view()` call (always starts empty).  The repopulation from
   `state.visible_layouts` only ran inside the `RedrawRequested`
   branch.  In winit's event loop, input events fire *before*
   `RedrawRequested`, so clicks arriving in the first event batch
   after a view rebuild found an empty element list and were lost.

   Fix: repopulate `visible_elements` from persisted state at the top
   of `update()`, before the dispatch loop.

3. Width reported as Shrink instead of Fill

   Both `size()` and `layout()` used `Length::Shrink` for width,
   causing the List to collapse to intrinsic content width rather
   than filling available horizontal space.  Parent containers
   (scrollables, etc.) would not allocate full width to it.
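
The first bug in the commit message above (a `.map()` that is never consumed) is a general property of Rust iterators and can be reproduced in isolation. The sketch below is not the List widget code; `dispatch_lazy` and `dispatch_eager` are hypothetical names, and strings stand in for events forwarded to child widgets.

```rust
/// Buggy pattern: `.map()` only builds a lazy adapter. Because the
/// iterator is never consumed, the closure never runs.
fn dispatch_lazy(events: &[&str]) -> Vec<String> {
    let mut delivered = Vec::new();
    let _ = events.iter().map(|e| delivered.push(e.to_string()));
    delivered
}

/// Fixed pattern: `.for_each()` consumes the iterator, so the closure
/// runs once per element.
fn dispatch_eager(events: &[&str]) -> Vec<String> {
    let mut delivered = Vec::new();
    events.iter().for_each(|e| delivered.push(e.to_string()));
    delivered
}

fn main() {
    let events = ["click", "key"];
    println!("lazy delivered:  {:?}", dispatch_lazy(&events));
    println!("eager delivered: {:?}", dispatch_eager(&events));
}
```

Without the `let _ =` binding, the compiler would flag the unused `Map` value with an `unused_must_use` warning, which is one way this class of bug can be caught early.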
glima force-pushed the fix-wgpu-adapter-stale-surface branch from 57d6946 to 94f845f April 23, 2026 04:38